Python: Split type checkers by target (pyright source, 5 checkers on tests/samples) by eavanvalkenburg · Pull Request #6443 · microsoft/agent-framework

eavanvalkenburg · 2026-06-10T08:39:38Z

Motivation and Context

Following the "too many type checkers"
approach, this reworks our typing setup so that each target is checked by the
right tool(s):

- The source ran two strict checkers (mypy + pyright), doubling the
  annotation burden and producing checker-specific friction on internal code.
- Tests and samples were under-checked, so type problems in the way customers
  actually call the public API only surfaced downstream.

The goal: make source development easier (one source checker) while giving
customers more confidence that running any of the five common type checkers
against their own code will surface fewer surprises.

Description

- Source: Pyright (strict) is now the sole source-code type checker.
  mypy is removed from source; its [tool.mypy] block becomes a relaxed
  profile used only for tests/samples.
- Tests: checked by all five checkers, namely pyright (relaxed), mypy,
  pyrefly, ty, and zuban.
- Samples: checked by pyright, pyrefly, and ty (mypy/zuban can't resolve
  script-style sample layouts).
- The relaxed/basic profiles intentionally disable noisy rules (private-access,
  not-required TypedDict access, untyped test bodies) so test/sample authors
  aren't forced into ugly over-annotation. Narrow, rule-specific ignores
  (# pyright: ignore[rule], # type: ignore[code]) are used only where a
  checker is pedantic.
- Added pyrightconfig.tests.json; bumped sample pyright configs to basic.
- Added pyrefly.toml, pyrefly.samples.toml, ty.samples.toml.
- Unified test/sample typing onto the same parallel fan-out used by source
  pyright (run_command_items in scripts/task_runner.py) for consistent
  execution.
- Made version-conditional imports symmetric: the # type: ignore is now kept
  or dropped on both branches, so results match across interpreter versions
  (local vs CI) instead of only matching the current venv.
- Updated SKILL.md, DEV_SETUP.md, and CODING_STANDARD.md for the five
  gating checkers and pyright over source + tests + samples.
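
To illustrate the symmetric-import rule from the description, here is a minimal sketch of a check that flags version-conditional imports where only one branch carries a `# type: ignore`. The function name `ignores_are_symmetric` and the sample snippets are hypothetical and not part of this PR; the sketch only handles single-line imports.

```python
import ast


def ignores_are_symmetric(source: str) -> bool:
    """Return True if both branches of each version-conditional import agree on `# type: ignore`."""
    lines = source.splitlines()

    def has_ignore(stmts):
        # Only look at import statements, and only at the line where each one starts.
        return any(
            "# type: ignore" in lines[stmt.lineno - 1]
            for stmt in stmts
            if isinstance(stmt, (ast.Import, ast.ImportFrom))
        )

    for node in ast.walk(ast.parse(source)):
        is_version_check = isinstance(node, ast.If) and "version_info" in ast.unparse(node.test)
        if is_version_check and node.orelse:
            if has_ignore(node.body) != has_ignore(node.orelse):
                return False
    return True


# Hypothetical samples: the ignore is present on both branches, or on only one.
SYMMETRIC = """
import sys
if sys.version_info >= (3, 11):
    from typing import Self  # type: ignore
else:
    from typing_extensions import Self  # type: ignore
"""

ASYMMETRIC = """
import sys
if sys.version_info >= (3, 11):
    from typing import Self
else:
    from typing_extensions import Self  # type: ignore
"""
```

A checker only evaluates the branch that matches its configured Python version, so an ignore on just one branch can be reported as unused under one interpreter and missing under another. Keeping both branches in the same state avoids that split.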

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Copilot

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

…tests/samples)

Rework the typing setup along the lines of the 'too many type checkers' approach:

- Pyright (strict) is now the sole source-code type checker; mypy is removed from source and its [tool.mypy] block becomes a relaxed profile used only for tests/samples.
- Tests are checked by all five checkers (pyright relaxed, mypy, pyrefly, ty, zuban); samples by pyright, pyrefly, and ty. All run in a relaxed/basic profile so authors aren't forced into over-annotation.
- Add pyrightconfig.tests.json and bump sample pyright configs to basic.
- Unify test/sample typing onto the same parallel fan-out used by source pyright via run_command_items in task_runner.py.
- Make version-conditional imports symmetric: keep or drop the '# type: ignore' on both branches so results match across interpreter versions (local vs CI).
- Update SKILL.md, DEV_SETUP.md, and CODING_STANDARD.md for the five gating checkers and pyright on source+tests+samples.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
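
The parallel fan-out named in this commit can be pictured roughly as follows. This is a generic sketch and not the actual `run_command_items` from task_runner.py, whose signature is not shown in this PR page; `run_all` and its parameters are assumptions made for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, max_workers=4):
    """Run each command in its own process and return exit codes in input order."""

    def run_one(cmd):
        # Capture output so concurrent checkers do not interleave on the terminal.
        return subprocess.run(cmd, capture_output=True).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input commands.
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Stand-in commands; a real run would invoke one checker per package.
    demo = [
        [sys.executable, "-c", "raise SystemExit(0)"],
        [sys.executable, "-c", "raise SystemExit(1)"],
    ]
    print(run_all(demo))
```

Threads are enough here because each worker only waits on a child process; the checkers themselves run in separate processes.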

Merging main into the type-checker split branch surfaced regressions that the new five-checker test suite and unit tests caught:

Runtime fixes:
- anthropic: restore the dropped `cache_read_input_token_count` mapping in _parse_usage_from_anthropic (lost during merge conflict resolution).
- gemini: _get_function_calling_mode test helper returned str(enum) ('FunctionCallingConfigMode.AUTO') instead of the enum value ('AUTO').
- openai: _response_id_from_token test helper was an infinite self-recursion; return token['response_id'].
- orchestrations: reset output_events per approval iteration so the terminal output assertion counts only the final run.
- core: drop a stale duplicate harness test whose message ('non-negative') contradicted the source ('positive').
- purview: import PolicyLocation/PolicyScope/ProtectionScopeActivities/ExecutionMode used by the processor tests.

Type-checker fixes (tests, relaxed profile):
- core: pyright/mypy/pyrefly/ty/zuban green-ups across the harness, MCP, observability and types tests.
- anthropic/openai: route provider-namespaced UsageDetails keys through a dict cast (extra_items TypedDict unsupported by mypy/ty).
- purview: typed model constructors and cache-mock casts.
- ag-ui: annotate WorkflowContext[Any, Any] so yield_output accepts test payloads, guard Optional forwarded_props, and ty-ignore intentional bad args.

Source pyright (sole source checker) flagged unnecessary ignores newly introduced by merged code in core _tools.py and declarative _declarative_base.py.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
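
The gemini fix in this commit comes down to the difference between `str(member)` and `member.value`. A small sketch, using a hypothetical stand-in enum rather than the real SDK type:

```python
from enum import Enum


class FunctionCallingMode(Enum):
    """Hypothetical stand-in for the SDK's function-calling mode enum."""

    AUTO = "AUTO"
    NONE = "NONE"


def mode_buggy(mode: FunctionCallingMode) -> str:
    # str() on a plain Enum member yields "ClassName.MEMBER", not the value.
    return str(mode)


def mode_fixed(mode: FunctionCallingMode) -> str:
    # .value returns the underlying string the assertions expect.
    return mode.value
```

Here `mode_buggy(FunctionCallingMode.AUTO)` gives `'FunctionCallingMode.AUTO'`, while `mode_fixed` gives `'AUTO'`.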

The parallel test-typing fan-out runs many mypy processes concurrently, all defaulting to a single shared ./.mypy_cache. Concurrent writes corrupt the cache and mypy aborts with INTERNAL ERROR (intermittently, depending on worker timing), which is why CI's Test Typing job failed on a shifting set of packages while a single-package run was fine.

Give each mypy invocation an isolated cache dir keyed by its target paths so incremental caching still works per package without races. Other checkers (zuban/pyrefly/ty/pyright) maintain their own caches and are unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
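
One way to key a cache directory by target paths, as this commit describes, is to hash the targets and pass the result to mypy's `--cache-dir` option. The sketch below is an assumption about the shape of such a helper; `isolated_mypy_command`, the hashing scheme, and the example paths are hypothetical and not taken from the PR.

```python
import hashlib
from pathlib import Path


def isolated_mypy_command(targets, cache_root=".mypy_cache"):
    """Build a mypy command whose cache directory is derived from its targets."""
    # Sort so the key is independent of argument order; join with NUL so
    # ["a", "b"] and ["ab"] cannot produce the same key.
    joined = "\0".join(sorted(targets)).encode("utf-8")
    key = hashlib.sha256(joined).hexdigest()[:16]
    cache_dir = Path(cache_root) / key
    return ["mypy", "--cache-dir", str(cache_dir), *targets]


# Hypothetical package paths: each invocation gets its own cache directory.
core_cmd = isolated_mypy_command(["packages/core/tests"])
openai_cmd = isolated_mypy_command(["packages/openai/tests"])
```

Because the key is deterministic, repeated runs over the same package reuse the same directory, so incremental caching keeps working while concurrent workers never write to a shared location.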

Lab was the last package still running mypy on its source code, requiring mypy-only `# type: ignore` comments that pyright (the sole source checker everywhere else) flags as unnecessary. Align lab with the rest of the monorepo:

- Remove the lab source mypy poe tasks (mypy-gaia/lightning/tau2) and the now-dead strict [tool.mypy] config block.
- Drop the 'Run lab mypy' CI step; lab source is type-checked by pyright only.

Lab tests remain covered by the workspace test-typing fan-out (mypy, pyrefly, ty, zuban, pyright over tests using the relaxed root config).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

A fresh merge from main brought in new test code never run under the five-checker test-typing suite. Green up across the affected packages:

- core: narrow Optional span.attributes with 'and' guards in span filters and assert+cast the json.loads(...attributes[...]) reads (test_observability); match the existing as_agent ignore on the protocol-typed fixture (test_clients).
- openai: align new streaming tests with the established chat_options dict pattern (ChatOptions TypedDict isn't assignable to dict), route Optional .annotations[0] access through a small _first_annotation helper (mirrors the file's assert-not-None convention), and annotate a mapped ResponseStream.
- foundry_hosting: annotate error: dict[str, Any] = body.get(...) or {} (zuban needs the annotation).
- foundry: narrow ignores for the live AIProjectClient credential arg (pyrefly) and connections.get_default (zuban) SDK type gaps.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
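
Two of the patterns named in this commit, the assert-not-None helper and the explicitly annotated `or {}` fallback, can be sketched as below. The `Message` class, `first_annotation`, and `error_payload` are hypothetical stand-ins; the real helper in the openai tests may differ.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Message:
    """Hypothetical stand-in for a response object with optional annotations."""

    annotations: Optional[list[dict[str, Any]]] = None


def first_annotation(message: Message) -> dict[str, Any]:
    # The assert narrows Optional[list[...]] to list[...] for every checker,
    # and also rejects an empty list, so the index below is safe.
    assert message.annotations, "expected at least one annotation"
    return message.annotations[0]


def error_payload(body: dict[str, Any]) -> dict[str, Any]:
    # The explicit annotation tells the checker what type `or {}` produces.
    error: dict[str, Any] = body.get("error") or {}
    return error
```

Centralizing the narrowing in one helper keeps individual test assertions free of repeated None checks and per-line ignores.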

Copilot AI review requested due to automatic review settings June 10, 2026 08:39

Copilot AI reviewed Jun 10, 2026

eavanvalkenburg force-pushed the typing_exp branch from f2be487 to 31c4b66 Compare June 10, 2026 08:40

moonbox3 added the documentation, python, and lab labels Jun 10, 2026

eavanvalkenburg force-pushed the typing_exp branch from 31c4b66 to 30ee0b6 Compare June 15, 2026 14:00

eavanvalkenburg temporarily deployed to integration June 15, 2026 14:00 — with GitHub Actions Inactive

eavanvalkenburg temporarily deployed to integration June 15, 2026 14:45 — with GitHub Actions Inactive

eavanvalkenburg temporarily deployed to integration June 15, 2026 15:02 — with GitHub Actions Inactive

eavanvalkenburg temporarily deployed to integration June 15, 2026 15:11 — with GitHub Actions Inactive

eavanvalkenburg marked this pull request as ready for review June 15, 2026 17:11

eavanvalkenburg requested a review from a team as a code owner June 15, 2026 17:11

moonbox3 removed the lab label Jun 17, 2026

eavanvalkenburg and others added 4 commits June 17, 2026 14:36

eavanvalkenburg force-pushed the typing_exp branch from a1f1cb3 to c4f767f Compare June 17, 2026 12:36

moonbox3 added the lab label Jun 17, 2026

eavanvalkenburg enabled auto-merge June 17, 2026 14:58

Python: Split type checkers by target (pyright source, 5 checkers on tests/samples)#6443
eavanvalkenburg wants to merge 5 commits into microsoft:main from eavanvalkenburg:typing_exp

3 participants
